Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience

Authors

  • Jyotishman Pathak
  • Janey Wang
  • Sudha Kashyap
  • Melissa A. Basford
  • Rongling Li
  • Daniel R. Masys
  • Christopher G. Chute

Abstract

BACKGROUND Systematic study of clinical phenotypes is important for a better understanding of the genetic basis of human diseases and for more effective gene-based disease management. A key requirement for such studies is the standardized representation of phenotype data using common data elements (CDEs) and controlled biomedical vocabularies. In this study, the authors analyzed the extent to which a limited subset of phenotypic data is amenable to common definition and standardized collection. They also analyzed how adopting such definitions in large-scale epidemiological and genome-wide studies can significantly facilitate cross-study analysis.

METHODS The authors mapped phenotype data dictionaries from five eMERGE (Electronic Medical Records and Genomics) Network sites studying multiple diseases, such as peripheral arterial disease and type 2 diabetes. The mapping used standardized terminology and metadata repository resources, such as the caDSR (Cancer Data Standards Registry and Repository) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms). The mapping process combined lexical techniques (searching for relevant pre-coordinated concepts and data elements) and semantic techniques (post-coordination). Where feasible, new data elements were curated to improve coverage during mapping. A web-based application was also developed to uniformly represent and query the mapped data elements from the different eMERGE studies.

RESULTS Approximately 60% of the target data elements (95 of 157) could be mapped using simple lexical analysis of pre-coordinated terms and concepts. This was achieved before eMERGE investigators began any additional curation of terminology and metadata resources. The authors then curated 54 new caDSR CDEs and nine new NCI Thesaurus concepts. With these additions and post-coordination, they were able to map the remaining 40% of data elements to caDSR and SNOMED CT. A web-based tool was also implemented to assist in semi-automatic mapping of data elements.

CONCLUSION This study emphasizes the need for standardized representation of clinical research data using existing metadata and terminology resources. It also provides simple techniques and software for data element mapping, based on experiences from the eMERGE Network.
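The METHODS paragraph distinguishes two kinds of mapping. The first is lexical mapping, which looks up an existing pre-coordinated concept for a data-element label. The second is semantic mapping via post-coordination, which composes a focus concept with attribute refinements when no single concept fits.

Below is a minimal Python sketch of the two ideas, assuming a tiny in-memory terminology. It is not the eMERGE tool's implementation. The `Concept` class, the helper functions, and every identifier (`C001`, `P100`, and so on) are hypothetical placeholders, not real caDSR, NCI Thesaurus, or SNOMED CT codes. The expression string only imitates the `focus : attribute = value` shape of SNOMED CT compositional grammar.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    # Placeholder identifier: NOT a real SNOMED CT, caDSR, or NCI Thesaurus code.
    concept_id: str
    term: str
    synonyms: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def lexical_match(label: str, concepts: list[Concept]) -> Concept | None:
    """Lexical step: return the first pre-coordinated concept whose preferred
    term or a synonym equals the normalized data-element label, else None."""
    key = normalize(label)
    for concept in concepts:
        names = {normalize(name) for name in [concept.term, *concept.synonyms]}
        if key in names:
            return concept
    return None


def post_coordinate(focus: Concept, refinements: list[tuple[Concept, Concept]]) -> str:
    """Semantic step: compose a focus concept with attribute = value pairs,
    imitating the 'focus : attribute = value' shape of SNOMED CT
    compositional grammar (illustrative only, not a validated expression)."""

    def ref(c: Concept) -> str:
        return f"{c.concept_id} |{c.term}|"

    if not refinements:
        return ref(focus)
    pairs = ", ".join(f"{ref(attr)} = {ref(value)}" for attr, value in refinements)
    return f"{ref(focus)} : {pairs}"


# Hypothetical toy terminology; ids and entries are placeholders.
SAMPLE_CONCEPTS = [
    Concept("C001", "Type 2 diabetes mellitus", ["type 2 diabetes", "T2DM"]),
    Concept("C002", "Peripheral arterial disease", ["PAD"]),
]

if __name__ == "__main__":
    print(lexical_match("Type-2 Diabetes?", SAMPLE_CONCEPTS))
    # No pre-coordinated match, so fall back to a post-coordinated expression.
    if lexical_match("Ankle procedure", SAMPLE_CONCEPTS) is None:
        print(post_coordinate(
            Concept("P100", "Procedure"),
            [(Concept("P200", "Procedure site"), Concept("P300", "Ankle region"))],
        ))
```

Consider the label `"Type-2 Diabetes?"`. Normalization folds the hyphen and punctuation, so `lexical_match` resolves it to the `C001` placeholder. A label with no exact match returns `None`, which marks it as a candidate for curation of new concepts or for post-coordination. Real tools would add fuzzy matching and human review; exact matching is used here only to keep the sketch small.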

Journal:
  • Journal of the American Medical Informatics Association : JAMIA

Volume 18, Issue 4

Publication date: 2011